Scaling elastic cloud clusters
Because Kubernetes natively scales on resource metrics such as CPU, Boomi uses KEDA to scale on custom application metrics. As a result, cluster nodes no longer scale with execution activity; instead, the workers scale on the number of executions, and they can scale to 0 when the Minimum Execution Worker quota is set to 0.
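As a minimal sketch of what this looks like in practice (the Deployment name, metric endpoint, and values below are illustrative assumptions, not Boomi's actual configuration), a KEDA ScaledObject can scale a worker Deployment on a custom application metric and permit scale-to-zero:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: execution-worker-scaler
spec:
  scaleTargetRef:
    name: execution-worker      # hypothetical execution worker Deployment
  minReplicaCount: 0            # mirrors a Minimum Execution Worker quota of 0
  maxReplicaCount: 10           # mirrors the Execution Worker quota
  triggers:
    - type: metrics-api         # KEDA's generic scaler for custom application metrics
      metadata:
        url: "http://worker-metrics/executions"   # hypothetical metrics endpoint
        valueLocation: "runningExecutions"
        targetValue: "8"        # e.g. 80% of a Maximum Running Executions quota of 10
```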
How does the Kubernetes cluster autoscale?
Cluster autoscaling is highly dependent on the Kubernetes platform and involves two primary layers:
- Pod Autoscaling: In an elastic runtime, execution workers dynamically scale at the pod level. This is managed by a Horizontal Pod Autoscaler (HPA), which automatically adjusts the number of pod replicas based on observed CPU and memory utilization, as well as the number of executions (see the sketch after this list).
- Node Autoscaling: Nodes are the underlying machines that back the Kubernetes cluster. As the execution workload increases, the existing nodes may no longer have sufficient resources to accommodate the new pods. At this point, node autoscaling becomes necessary to provision additional nodes, ensuring the cluster has enough capacity for the growing workload. Conversely, if the workload decreases, nodes will be deprovisioned, leading to cost savings.
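For reference, the pod-autoscaling layer is expressed as an HPA object along the lines of the sketch below (names and limits are assumptions; in a KEDA-based cluster such as this one, KEDA generates the equivalent HPA from the ScaledObject rather than it being created by hand):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: execution-worker-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: execution-worker      # hypothetical worker Deployment
  minReplicas: 1
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 75   # the default CPU scaling value described below
```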
Scaling workers up
The Minimum Execution Worker quota governs the minimum number of replicas (workers), and the Execution Worker quota sets the maximum number of replicas.
Within these two boundaries, the cluster scales horizontally based on the following metrics:
- CPU
- Memory
- Number of executions
The default resource-based scaling value is 75% for CPU and 90% for memory. For application-based scaling, the execution worker’s Maximum Running Executions quota is used.
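Expressed as KEDA triggers on the ScaledObject sketched earlier, these defaults would look roughly like the fragment below (again an illustrative assumption, not Boomi's published configuration):

```yaml
  triggers:
    - type: cpu                 # KEDA's built-in CPU scaler
      metricType: Utilization
      metadata:
        value: "75"             # default resource-based scaling value for CPU
    - type: memory              # KEDA's built-in memory scaler
      metricType: Utilization
      metadata:
        value: "90"             # default resource-based scaling value for memory
```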
The Worker Elastic Scaling Threshold defines a percentage of the existing Worker Maximum Running Executions quota. If that threshold is exceeded, the autoscaler starts additional workers. For example, if the Scaling Threshold quota is set to 80% and the Maximum Running Executions quota is 10, an additional worker is started when 8 concurrent executions are running. When there are multiple workers, the scaling quota is compared to the average number of executions across all of the workers, so in the same example with two workers, an additional worker is started when there are 16 concurrent executions (80% of a possible 20 executions).
The default value for this cloud quota is 75%. Executions will be queued if the number of running executions exceeds the execution worker Maximum Running Executions quota.
This quota can be lowered to more aggressively scale out the workers based on the number of incoming execution requests.
When a scaling threshold is met, the Horizontal Pod Autoscaler scales the worker replicas using the algorithm below:
```
desiredReplicas = ceil[currentReplicas * ( currentMetricValue / desiredMetricValue )]
```
A Horizontal Pod Autoscaler is a Kubernetes feature that updates a workload resource (such as a Deployment) to automatically scale the workload to match demand; as demand grows, it responds by deploying more pods. This differs from vertical scaling in Kubernetes, which assigns more resources, such as memory, to the pods already running a workload.
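To make the formula concrete with the quotas used earlier (the execution counts are assumed for illustration): with a Maximum Running Executions quota of 10 and a 75% threshold, the desired metric value is 7.5 executions per worker. If 2 workers are averaging 9 running executions each, then desiredReplicas = ceil[2 * (9 / 7.5)] = ceil[2.4] = 3, so one additional worker is started.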
Scaling workers down
Scaling the workers between 1 replica and the maximum number of replicas (defined by the Execution Worker quota) is handled by the Horizontal Pod Autoscaler algorithm.
If the Minimum Execution Worker quota is set to 0, the cluster will remove all workers. Workers scale in from 1 to 0 only if all of the scaling metrics stay below their thresholds for 5 minutes. A new worker automatically starts upon receiving a new execution request.
Scaling nodes
You can use Amazon Elastic Kubernetes Service, Azure Kubernetes Service, or Google Kubernetes Engine to autoscale nodes.
For more information about node autoscaling, refer to the Node autoscaling topic on the Kubernetes documentation site.
You can also use Karpenter to autoscale any Kubernetes cluster. Karpenter is an open-source node lifecycle management service built specifically for Kubernetes, which improves the efficiency of running workloads on a cluster.
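As a hedged sketch (the schema varies by Karpenter version and cloud provider, and all names and limits here are assumptions), a minimal NodePool that provisions on-demand nodes and consolidates them away when empty or underutilized might look like:

```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: execution-workers
spec:
  template:
    spec:
      nodeClassRef:
        group: karpenter.k8s.aws   # assumes the AWS provider's EC2NodeClass
        kind: EC2NodeClass
        name: default
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand"]
  limits:
    cpu: "100"                     # hypothetical cap on total provisioned CPU
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
```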